P015866US 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
APPLICATION PAPERS
OF 

WILCO DIJKSTRA
FOR 

ADDRESS GENERATION 




BACKGROUND OF THE INVENTION

Field of the Invention 

The present invention relates to address generation and in particular to address
generation in a data processing apparatus.

Description of the Prior Art

Address generators for data processing apparatus are known. One such data
processing apparatus is shown in Figure 1. The data processing apparatus, generally 5,
comprises a processor core 10 arranged to process instructions received from a memory
20 via a bus interface unit (BIU) 50. Data required by the processor core 10 for
processing those instructions may also be retrieved from the memory 20 via the BIU 50.
A cache 30 is provided for storing data values (which may be data and/or instructions)
retrieved from the memory 20 so that they are subsequently readily accessible by the
processor core 10. A cache controller 40 controls the storage of data values in the cache
30 and controls the retrieval of the data values from the cache 30.

The processor core 10 is a pipelined processor which enables multiple
instructions to be in the process of being executed at the same time. Rather than
having to wait until the execution of one instruction has fully completed before
providing the next instruction to the processor core 10, a pipelined processor is able to
receive new instructions into the pipeline whilst other instructions are still in the
process of being executed at subsequent pipeline stages of the pipeline, thereby
significantly improving performance.

To further improve the performance of pipelined processors, it is known to
provide the processor core 10 with multiple pipelines, as illustrated in Figure 2, rather
than just a single pipeline. As shown, a pipeline may be provided for dealing with
load and store instructions, whilst a different pipeline may be provided for dealing
with arithmetic instructions. Typically, the first part of each pipeline is unified such
that there is a common pipeline for the earlier stages of instruction execution, such as
the fetch and decode stages 60, 70. Thereafter, the pipeline splits with arithmetic
instructions being executed by one or more first execution stages 80 and load and
store instructions being executed by a second execution stage 90 and then one or more
memory stages 100. The pipelines may then become unified again for the last stage of
instruction execution at the write-back stage 110.

It will be appreciated that the speed at which the processor core 10 can run is
limited ultimately by the critical path of instructions through the various stages. The
time taken to execute operations on the critical path will affect the speed at which the
processor core 10 can run. Due to the relatively slow access time, instructions causing
memory accesses will invariably be on the critical path. Hence, any techniques to
reduce the access time, such as reducing the time taken to generate the address for
memory accesses, will usually enable the processor core 10 to be run more quickly.
The memory addresses required for such accesses are generated by the second execution
stage 90. Accordingly, to further improve the performance of the pipelined processor,
the second execution stage 90 is optimised for memory address generation for loads
and stores to the cache 30.

The processor instruction set may define a number of different instructions that
can be used to generate such loads and stores. Typically, in ARM (trademark)
architectures, five different load (and corresponding store) instructions are supported.
The load instructions comprise:

a) LDR Ra, [Rb, #I] (i.e. load into the register Ra the value stored in the
memory address referred to by adding the immediate 'I' to the contents of the register
Rb; the immediate 'I' may be a positive or negative integer);

b) LDR Ra, [Rb, Rc] (i.e. load into the register Ra the value stored in the
memory address referred to by adding the contents of the register Rb to the contents of
the register Rc);

c) LDR Ra, [Rb, -Rc] (i.e. load into the register Ra the value stored in the
memory address referred to by subtracting the contents of the register Rc from the
contents of the register Rb);

d) LDR Ra, [Rb, Rc, LSL/R #N] (i.e. load into the register Ra the value stored in
the memory address referred to by adding the contents of the register Rb to the contents
of the register Rc when subjected to a logical shift left/right by N bits); and

e) LDR Ra, [Rb, -Rc, LSL/R #N] (i.e. load into the register Ra the value stored in
the memory address referred to by subtracting the contents of the register Rc when
subjected to a logical shift left/right by N bits from the contents of the register Rb).

As mentioned above, five corresponding store instructions are also supported.

Hence, it will be appreciated that the second execution stage 90 is required to
support both addition and shift operations for positive and negative operands. It will
also be appreciated that shift operations provide a convenient mechanism when it is
desired to perform, for example, a multiplication operation on the contents of the Rc
register.
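The five load forms listed above can be summarised by a single address calculation. The following is a minimal sketch only, assuming 32-bit addresses that wrap around; the function names `shift` and `effective_address` and their parameters are illustrative and do not appear in the application.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses with wrap-around


def shift(value, kind, n):
    """Logical shift of a 32-bit value; kind is 'LSL' (left) or 'LSR' (right)."""
    if kind == "LSL":
        return (value << n) & MASK32
    if kind == "LSR":
        return (value & MASK32) >> n
    raise ValueError(kind)


def effective_address(rb, offset, subtract=False, shift_kind=None, shift_amount=0):
    """Address for LDR Ra, [Rb, +/-offset {, LSL/R #N}].

    offset is either the immediate 'I' (which may itself be negative)
    or the contents of register Rc.
    """
    if shift_kind is not None:
        offset = shift(offset, shift_kind, shift_amount)
    if subtract:
        return (rb - offset) & MASK32
    return (rb + offset) & MASK32
```

For example, form (d) with Rb = 0x1000, Rc = 3 and a logical shift left by two bits corresponds to `effective_address(0x1000, 3, shift_kind="LSL", shift_amount=2)`, which shows the shift acting as a multiplication of Rc by four.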

Accordingly, as illustrated by Figures 3A and 3B, a prior art address generator
120 in the second execution stage 90 incorporated a combined adder and shift
function.

Typically, a value such as the immediate 'I' or the contents of the register Rc
are provided on the X input. From this X input, the value of the X input (X) and the
inverse of the X input (X̄) are provided to a multiplexer 130. Also, the X input
logically shifted a number of bits left or right (X_LSL/R(#1-N)) is generated by a shifter
135 and the inverse of the X input logically shifted a number of bits left or right
(X̄_LSL/R(#1-N)) is also provided by an inverter to the multiplexer 130. Typically, where
the operands are, for example, 32 bit, the X input may be shifted by any number from
1 to 31 bits left or right in order to generate the (X_LSL/R(#1-N)) output. On the Y input,
the contents of the Rb register are typically provided to an A input of an adder 140.

The multiplexer 130 selects one of the inputs to be provided to the B input of
the adder 140 dependent on the instruction. The adder 140 then adds together the
contents provided on the A and B inputs and generates the output A+B. It will be
appreciated that for addition operations the carry input C will be set to a logical '0',
whereas for subtraction operations the carry input C will be set to a logical '1'.
Through this approach the address generator 120 can provide the functionality to
support all the instructions mentioned above.
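The combined adder and shift function just described can be modelled behaviourally. The sketch below is an illustration under stated assumptions, not the circuit of Figures 3A and 3B: it assumes 32-bit operands, and the names `invert` and `prior_art_address` are hypothetical. It shows why inverting the B input and setting the carry input to '1' yields a subtraction in two's-complement arithmetic.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands


def invert(value):
    """Bitwise inverse of a 32-bit value, as produced by an inverter."""
    return ~value & MASK32


def prior_art_address(y, x, subtract=False, shift_kind=None, n=0):
    """Model of a four-input multiplexer feeding a two-operand adder with carry-in."""
    x &= MASK32
    if shift_kind == "LSL":
        shifted = (x << n) & MASK32
    elif shift_kind == "LSR":
        shifted = x >> n
    else:
        shifted = x
    # The four candidate B inputs: X, inverse X, shifted X, inverse shifted X.
    mux_inputs = {
        ("plain", False): x,
        ("plain", True): invert(x),
        ("shifted", False): shifted,
        ("shifted", True): invert(shifted),
    }
    b = mux_inputs[("shifted" if shift_kind else "plain", subtract)]
    carry_in = 1 if subtract else 0  # A + (inverse B) + 1 equals A - B
    return (y + b + carry_in) & MASK32
```

In this model every instruction pays for the full shifter and the four-way selection, whichever of the four inputs is actually used.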

However, it was found that providing this degree of functionality within this
stage of the pipeline meant that the time taken for this stage to process the instruction
was relatively long which, because this stage is on the critical path, limited the speed
at which instructions could be clocked through the pipeline. The reason that the
execution speed of this address generator 120 was relatively slow was due to the time
taken by the shifter 135 to generate the (X_LSL/R(#1-N)) and (X̄_LSL/R(#1-N)) operands.

To address this problem, a subsequent arrangement of address generator 150
split the adder and shift functions into two different logic units 160, 170, as illustrated
by Figure 4.

Any shift operation is performed by the shifter 135 in the shift logic unit 160
prior to the add operation. Hence, the output provided by the shift logic unit 160 may
be the X input (i.e. X) or the X input shifted left or right by any number of bits (i.e.
X_LSL/R(#1-N)), dependent on the selection made by the multiplexer 137. The adder logic
unit 170 retains the functionality to provide the inverse of an operand, dependent on
the selection made by the multiplexer 130. Hence, the adder is still able to receive on
input B the X input, the inverse of the X input (X̄), the X input logically shifted a
number of bits left or right (X_LSL/R(#1-N)) and the inverse of the X input logically
shifted a number of bits left or right (X̄_LSL/R(#1-N)).

By separating these functions, the time taken for each logic unit 160, 170 to
process instructions is reduced. Hence, the speed at which instructions can be clocked
through the address generation stage 150 is increased. Typically, each logic unit 160,
170 was found to take about half the time to process instructions as the arrangement in
Figures 3A and 3B. Accordingly, the speed at which instructions are clocked through
the pipeline could be increased by about a factor of two.

However, it will be appreciated that a problem with this arrangement is that all
instructions must be routed through both the shift and adder logic units 160, 170,
irrespective of whether a shift operation is to be performed or not.

Accordingly, to address this problem a further address generator arrangement
was devised as illustrated in Figure 5.

In this arrangement only those instructions which require a shift operation are
routed by multiplexers 155, 165 through the shift logic unit 160, with all other
non-shift instructions being routed by multiplexers 155, 165 directly to the adder logic
unit 170.

Accordingly, the time taken to process shift instructions remains unchanged in
comparison with the arrangement in Figure 4 since these instructions are still routed
through both the shift and adder logic units 160, 170. However, the time taken by the
address generator 180 to process non-shift instructions is significantly reduced in
comparison with the prior arrangements since these instructions need not be routed
through the shift logic unit 160, which takes additional clock cycles, but may instead
be processed directly by the adder logic unit 170. It will be appreciated that such an
approach can increase the overall performance of the processor core 10 when shift
operations occur infrequently.

It is desired to further increase the performance of the processor core 10 when
processing instructions.

SUMMARY OF THE INVENTION

According to a first aspect, the present invention provides a data processing
apparatus comprising: a processor core operable to process a sequence of instructions,
the processor core having a plurality of pipeline stages, one of the plurality of pipeline
stages being an address generation stage operable to generate an address associated with
an instruction for subsequent processing by the pipeline stages, the instruction being one
from a first group of instructions or a second group of instructions, the address
generation stage comprising: address generation logic operable to receive operands
associated with the instruction, to generate a shifted operand from one of the operands,
and to add together, in dependence on the instruction, selected of the operands and the
shifted operand to generate the address for subsequent processing by the pipeline stages;
and operand routing logic operable, in dependence on the instruction, to route operands
associated with instructions from the first group of instructions to the address generation
logic and to route operands associated with instructions from the second group of
instructions via operand manipulation logic for manipulation of the operands prior to
routing to the address generation logic.

The present invention recognises that during typical processing of instructions
by the data processing apparatus, the occurrence of instructions which require one
particular shift operation from the set of all possible shift operations has been found to
be almost equally as high as those which do not require that shift operation.

Accordingly, address generation logic is provided which can perform both that
particular shift operation, when required, as well as an addition operation on the
operands of instructions. Providing address generation logic which can perform the
shift operation as well as an addition operation enables both of these operations to be
performed by the same logic without the need to always pass those instructions to
other logic for handling, such as previously occurred for those instructions requiring
the shift operation. It will be appreciated that because instructions requiring the shift
operation do not need to be passed to other logic for handling, the time taken to
process these instructions is significantly reduced and, hence, the performance of the
pipelined processor when processing such instructions is significantly increased.

The present invention also recognises that the processing speed of the address
generation logic can be limited by the speed of the logic which selects between the
operands in order to generate the address. Hence, in order to prevent the operating
speed of the address generation logic from decreasing, the further functionality
required to perform the shift operation cannot simply be added to the existing
functionality of the prior art adder logic since this would slow the operation of this
logic. Hence, so as to not slow the operation of the address generation logic, the logic
previously provided to generate inverse operands is removed and replaced with the
logic required to support the shift operation. Because the time taken to generate just
one particular shift operand from an operand associated with the instruction is
relatively small, the address generation logic may always perform this shift operation
without increasing the time taken by the address generation stage. The address
generation logic may then select the appropriate combination of original operands or
shifted operand for addition dependent on the instruction to be performed. Also,
because the number of operands to be selected by the address generation logic has not
increased, no increase in the time taken to select between operands occurs.

The operand manipulation logic may be provided separately and only those
instructions which require this functionality are routed through this logic. It will be
appreciated that routing instructions through this separate logic increases the time
taken by the address generation stage to generate the address for these instructions.
However, the present invention further recognises that the occurrence of instructions
which need to be routed via the operand manipulation logic is relatively low in
comparison to instructions which require the shift and additive operations.
Accordingly, not only is the overall performance of the address generation stage not
impacted by the provision of separate operand manipulation logic, the performance is
in fact significantly increased because the most frequently occurring shift operation
can now be dealt with directly by the address generation logic without the need to
incur the performance hit of routing these instructions to separate logic to deal with
shift operations.

Hence, it will be appreciated that the overall performance of the address
generation stage is increased because the address generation logic has been optimised
to handle only those most frequently occurring instructions. By only handling the
most common shift operation and addition operations, the functionality required to be
provided by the address generation logic can be minimised which, in turn, maximises
the speed at which this logic can operate.

In one embodiment, the instruction relates to a memory access and the address
indicates a location in memory to be accessed.

Hence, it will be appreciated that this arrangement is particularly suited to the
processing of instructions which generate locations in a memory associated with the data
processing apparatus to be accessed.

In one embodiment, the first group of instructions comprises a first instruction
which causes the processor core to logically add together two operands, and a second
instruction which causes the processor core to logically add together one operand to
another operand logically shifted by one of a predetermined number of bits.

In one embodiment, the address generation logic is operable to generate the
another operand logically shifted by one of a predetermined number of bits.

Hence, the address generation logic can generate addresses for those instructions
which require an addition of the two operands associated with the instruction or can
generate addresses for those instructions which require one operand to be added to
another operand which is shifted by a preset particular number of bits.

In one embodiment, the second instruction causes the processor core to logically
add together one operand to another operand logically shifted left by two bits.

The logical shift left by two bits operation has been found typically to be the
most commonly-occurring shift operation.

In one embodiment, the address generation logic is operable to generate the
another operand logically shifted left by two bits.

Hence, the address generation logic is optimized to handle the most frequently
occurring shift instruction which requires the operand to be logically shifted left by two
bits.

In one embodiment, the second instruction causes the processor core to logically 
add together one operand to another operand subject to only one preset logical shift 
operation.

By limiting the functionality of the address generation logic to handle an
instruction which requires just one preset logical shift operation, the size of the logic
can be reduced such that it can operate at high speed.

In one embodiment, the address generation logic is operable to perform only one
predetermined logical shift operation and operands associated with all other logical shift
operations required by instructions from said second group of instructions are routed via
operand manipulation logic for manipulation of operands prior to routing to the address
generation logic.

Hence, all instructions other than instructions for which the address generation
logic is optimized are passed to the operand manipulation logic. The operand
manipulation logic may then generate the necessary operands in a form that is suitable
for handling in an optimized way by the address generation logic.

In one embodiment, the second group of instructions comprises instructions
which cause the processor core to logically add together one operand to another operand
subject to any other logical shift operation.

In one embodiment, the operand manipulation logic is operable, in dependence
on the instruction, to generate the another operand logically shifted by any other number
of bits.

Hence, the operand manipulation logic can generate an operand shifted by any
number of bits which can then be supplied to the address generation logic. Although
generating these shifted operands can take a relatively long amount of time, because the
frequency at which such operands are generated is relatively low, the overall
performance of the address generation stage still remains significantly higher than prior
art arrangements.

In one embodiment, the second group of instructions comprises instructions
which cause the processor core to logically subtract one operand from another operand.

In one embodiment, the operand manipulation logic is operable, in dependence
on the instruction, to generate an inverse representation of one of the operand and the
another operand.

Hence, the operand manipulation logic can generate an inverse operand which
can then be supplied to the address generation logic. Although passing operands to the
operand manipulation logic to generate these inverse operands can take a relatively long
amount of time, because the frequency at which such operands are generated is relatively
low, the overall performance of the address generation stage still remains significantly
higher than prior art arrangements.

In one embodiment, the second group of instructions comprises a subtractive
instruction for which the address is generated by subtracting a subtrahend operand from
a minuend operand associated with the instruction, and the operand manipulation logic
comprises subtraction operand generation logic operable to generate a negative
representation of the subtrahend operand prior to routing to the address generation logic.
Hence, in such embodiments, the logic that was previously provided in the
address generation logic to support subtraction operations is removed and replaced
with logic required to support shift operations, and the subtraction operand generation
logic is provided separately. Only instructions which involve a subtraction need be
routed through this separate logic. Whilst routing instructions through this separate
logic increases the time taken by the address generation stage to generate the address,
it has been found that the occurrence of instructions involving a subtraction is
relatively low in comparison to instructions which require a shift and additive
operation. Accordingly, as mentioned above, not only is the overall performance of
the address generation logic not impacted by the provision of separate subtraction
operand generation logic, the performance is in fact significantly increased because
the more frequently occurring shift operation can now be dealt with directly by the
address generation logic without needing to be delayed due to routing these
instructions through separate logic to deal with the shift operation.
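The split between a plain adder and separate subtraction operand generation logic can be sketched as follows. This is an illustrative model only, assuming 32-bit two's-complement operands; the names `negate` and `generate_address` are hypothetical and not taken from the application.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit two's-complement operands


def negate(operand):
    """Negative representation of the subtrahend (separate, slower logic)."""
    return (-operand) & MASK32


def generate_address(minuend, operand, is_subtraction):
    """Subtraction becomes addition of a pre-negated operand."""
    operand &= MASK32
    if is_subtraction:
        # Only subtractive instructions take this extra step.
        operand = negate(operand)
    # The address generation logic itself only ever adds;
    # it needs no inverter and no carry-in for subtraction.
    return (minuend + operand) & MASK32
```

The design choice this illustrates is that the cost of negation is paid only by the less frequent subtractive instructions, while additive instructions go straight to the adder.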

In one embodiment, the address generation logic comprises: operand generation
logic operable to receive a first operand associated with the instruction and to generate a
shifted operand representative of the first operand shifted by a predetermined number of
bits; operand selection logic operable, in dependence on the instruction, to select one of
the first operand and the shifted operand as a selected operand; and addition logic
operable to add a second operand associated with the instruction to the selected
operand to generate the address for subsequent processing by the pipelined stages.
In such embodiments, the address generation logic receives a number of
operands associated with the instruction. For instructions in the first group of
instructions, these operands may be those operands which are the subject of the
instruction. For instructions in the second group of instructions these operands may be
one or more operands which are the subject of the instruction in addition to any
operands which may be generated by the operand manipulation logic. The operand
generation logic receives one of these operands and performs the shift operation by
generating a shifted operand. Operand selection logic then selects either the first
operand or the shifted operand to supply to the addition logic. The decision of which of
the first operand or the shifted operand to select is made based on the instruction itself.
The addition logic then receives the operand from the operand selection logic and adds
this to a second operand to generate the required address.
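The three parts of the address generation logic described above (operand generation, operand selection and addition) can be modelled as three small functions. This is a minimal sketch, assuming 32-bit operands and the shift-left-by-two embodiment; the function names and the `select_shifted` flag are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands
FIXED_SHIFT = 2      # assumption: the logical-shift-left-by-two embodiment


def operand_generation(first_operand):
    """Produce the single predetermined shifted operand."""
    return (first_operand << FIXED_SHIFT) & MASK32


def operand_selection(first_operand, shifted_operand, select_shifted):
    """Two-way choice between the first operand and its shifted form."""
    return shifted_operand if select_shifted else first_operand


def address_generation(first_operand, second_operand, select_shifted):
    """Add the second operand to whichever operand was selected."""
    first_operand &= MASK32
    shifted = operand_generation(first_operand)  # always generated
    selected = operand_selection(first_operand, shifted, select_shifted)
    return (second_operand + selected) & MASK32
```

Note that the shifted operand is generated for every instruction and simply ignored when not selected, which mirrors the statement that the shift may always be performed without lengthening the stage.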

In one embodiment, the first operand comprises 'n'-bits, where 'n' is a positive
integer, the operand generation logic receives the first operand over an 'n'-bit input bus
and provides the shifted operand on an 'n'-bit output bus, the operand generation logic
comprising: interconnection logic operable to couple lines of the 'n'-bit input bus with
lines of the 'n'-bit output bus to perform the shift operation.

Hence, the shift operation can be performed by hard-wiring the bus to present
the bits of the operand in a new order. It will be appreciated that such an approach is
fast and ensures that no undue delay is introduced in the address generation stage.
In one embodiment, the operand selection logic is a two-input multiplexer.

By providing two inputs to the operand selection logic, the operating speed of
this logic is maintained.

In one embodiment, the operand selection logic is operable to select one of the 
first operand and the shifted operand as a selected operand in response to a selection 
signal generated by instruction decoder logic.

The instruction decoder logic is typically provided in an earlier decode stage of
the pipeline. This logic typically generates a number of control signals, in dependence
on the instruction being processed, for use by the pipeline and other elements of the data
processing apparatus. One such control signal may be a selection signal which is used by
the operand selection logic to ensure that the correct operands are selected during the
address generation stage. By using pre-generated signals during this selection, it will be
appreciated that no undue delay is introduced at the address generation stage which may
otherwise occur should a determination need to be made by that stage regarding which
operands to select.

In one embodiment, the addition logic is a two operand adder.

In one embodiment, the operand routing logic is operable to route operands in
response to a routing signal generated by instruction decoder logic.

As mentioned above, the instruction decoder logic is typically provided in an
earlier decode stage of the pipeline and generates a number of control signals, in
dependence on the instruction being processed. One such control signal may be a
routing signal which is used by the operand routing logic to ensure that the operands are
routed either directly to the address generation logic, or via the operand manipulation
logic. By using pre-generated signals during this routing, it will be appreciated that no
undue delay is introduced at the address generation stage which may otherwise occur
should a determination need to be made by that stage regarding which logic to select.

In one embodiment, the instruction is a subtraction instruction which causes the
processor core to generate the address by subtracting a subtrahend operand in the form
of an immediate from a minuend operand, and the data processing apparatus comprises
instruction decoder logic operable to provide the subtrahend operand in negative form to
the address generation stage and to generate a routing signal to cause the operand
routing logic to route operands to the address generation logic.

As mentioned above, the instruction decoder logic is typically provided in an
earlier decode stage of the pipeline. During this stage a number of operations typically
need to be performed when decoding the instruction. It has been found that it is possible
to generate immediates in positive or negative form in parallel with the instruction
decoding without increasing the time taken by that stage. Accordingly, such negative
immediates can be generated by the decode logic and provided in the negative form to
the address generation stage. Because the immediate is already in negative form, there is
no need to invoke the operand manipulation logic. Accordingly, instructions utilising
negative immediates can be treated as additive instructions and the negative immediate
can be routed directly to the address generation logic. It will be appreciated that this
further improves the performance of the address generation stage.
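The handling of immediates in negative form can be sketched as below. This is an illustration under assumptions, not the decode logic of the embodiment: it assumes 32-bit two's-complement values, and the names `decode_immediate`, `address_generation` and the returned routing flag are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit two's-complement values


def decode_immediate(magnitude, is_subtraction):
    """Decode-stage step: emit the immediate already in the required sign.

    Returns (operand, route_via_manipulation). The flag is always False
    here because the immediate needs no further manipulation.
    """
    operand = (-magnitude if is_subtraction else magnitude) & MASK32
    return operand, False


def address_generation(base, operand):
    """Address-generation step: a plain addition in both cases."""
    return (base + operand) & MASK32
```

With these definitions, an instruction subtracting the immediate 4 from a base of 0x1000 is decoded to the operand 0xFFFFFFFC and then handled as an ordinary addition, giving 0xFFC.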

In one embodiment, the instruction is one of a load instruction and a store
instruction.

According to a second aspect of the present invention there is provided in a
data processing apparatus comprising a processor core operable to process a sequence
of instructions, the processor core having a plurality of pipeline stages, one of the
plurality of pipeline stages being an address generation stage operable to generate an
address associated with an instruction for subsequent processing by the pipeline
stages, the instruction being one from a first group of instructions or a second group of
instructions, a method of generating the address comprising the steps of: receiving, at
address generation logic, operands associated with the instruction; generating a shifted
operand from one of the operands; adding together, in dependence on the instruction,
selected of the operands and the shifted operand to generate the address for
subsequent processing by the pipeline stages; routing, in dependence on the
instruction, operands associated with instructions from the first group of instructions
to the address generation logic; and routing, in dependence on the instruction,
operands associated with instructions from the second group of instructions via
operand manipulation logic for manipulation of the operands prior to routing to the
address generation logic.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be described further, by way of example only, with
reference to a preferred embodiment thereof as illustrated in the accompanying
drawings, in which:

Figure 1 illustrates elements of a data processing apparatus;

Figure 2 illustrates an example arrangement of pipeline stages in a pipelined
processor;

Figures 3A and 3B illustrate a prior art arrangement of one stage in the pipelined
processor;

Figure 4 illustrates a subsequent prior art arrangement of one stage in the
pipelined processor;

Figure 5 illustrates a yet further prior art arrangement of one stage in the
pipelined processor;

Figures 6A and 6B illustrate an arrangement of one stage in the pipelined
processor according to an embodiment of the present invention; and

Figure 7 illustrates elements of a decode stage.

DESCRIPTION OF A PREFERRED EMBODIMENT

Figures 6A and 6B illustrate the arrangement of elements of an address
generation stage 200 of a pipelined processor in accordance with an embodiment of the
present invention. The address generation stage 200 is optimised to handle the most
commonly occurring instructions (i.e. addition operations with or without a particular
predetermined shift operation) in a minimal time, whilst more infrequently occurring
instructions (i.e. those requiring the generation of a negative operand and/or all other
shift operations) take longer to process. By optimising the address generation stage 200
to handle the most commonly occurring instructions more quickly than those which
occur less frequently, the overall performance of the address generation stage 200 is
improved.

The reason why the generation of a negative operand occurs infrequently can be
explained as follows. The address generation stage 200 is required to generate addresses
of data values to be accessed from locations in a memory in the course of the processor
executing an instruction. It is common practice when addressing memory to define a
base address and then to access other addresses which are offset from that base address.
When accessing memory in this way it is very common to generate an instruction which
results in an address being generated in the form of address Q plus some offset O (which
may be a number of bytes or words) to access a location a number of bytes or words
incremented from the base address. Equally, it is very common to generate an
instruction which results in an address being generated in the form of address Q plus
some offset O (which may be a number of bytes or words) multiplied by P (which is a
positive integer), also to access a location a number of bytes or words incremented from
the base address. However, it is very uncommon when accessing memory in this way to
generate an instruction which results in an address being generated in the form of
address Q minus some offset O (which may be a number of bytes or words) to access a
location a number of bytes or words decremented from the base address. Equally, it is
very uncommon to generate an instruction which results in an address being generated in
the form of address Q minus some offset O (which may be a number of bytes or words)
multiplied by P (which is a positive integer), also to access a location a number of bytes
or words decremented from the base address. This is because it would be normal
practice instead to simply change the location of the base address.
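
By way of illustration only, the addressing forms discussed above can be sketched in
Python. The function names, the 32-bit wrap-around and the example values are
assumptions made for this sketch and are not taken from the embodiment.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit address space


def incremented(q, o, p=1):
    """Common form: address Q plus offset O, optionally multiplied by P."""
    return (q + o * p) & MASK32


def decremented(q, o, p=1):
    """Uncommon form: address Q minus offset O, optionally multiplied by P."""
    return (q - o * p) & MASK32


# Instead of decrementing from the base address, the base address itself can be
# moved lower once, after which every access uses the common incremented form.
base = 0x1000
lowered_base = decremented(base, 4, 4)  # base moved down by four 4-byte words
```

With the base moved in this way, the location reached by `decremented(base, 3, 4)` is
the same location reached by `incremented(lowered_base, 1, 4)`.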

As shown in Figure 6A, the address generation stage 200 includes address
generation logic 220 (which is arranged to selectively add together operands as well as
to perform one predetermined shift operation, as will be described in more detail below),
inversion logic 210 (which is arranged to generate an inverse or negative or
complementary representation of an operand), shift logic 216 (which is arranged to
perform every possible shift operation) and routing logic in the form of multiplexers 205
and 215.

The address generation stage 200 receives operands associated with the
instruction to be processed. In this arrangement, the operand Q representing the base
address is provided directly to the address generation logic 220 over the path 255, whilst
the operand O representing the offset is provided over the path 256. The shift logic 216
operates to provide any required shift operation on the offset operand and to provide that
shifted operand to the multiplexer 205. The inversion logic 210 operates to provide an
inverted representation of the operand output by the multiplexer 205 and to provide that
inverted operand to the multiplexer 215. Accordingly, it will be appreciated that the
operand representing the offset can be routed as appropriate by the multiplexers 205 and
215 for manipulation prior to being provided to the address generation logic 220.

The operation of the logic which manipulates the offset operand prior to being
provided to the address generation logic 220 will now be explained in more detail. The
multiplexer 205 receives a routing signal R over path 203, and the multiplexer 215 receives
a routing signal T over path 208. The routing signals R and T are generated by decode
logic in a decode stage earlier in the pipeline, as will be described in more detail below.

The routing signals R and T are generated in dependence on the instruction
being processed. The multiplexer 205 is controlled using the routing signal R and
operates to select between the offset operand itself or a shifted offset operand provided
by shift logic 216 (the shift operation performed by the shift logic 216 will be selected in
dependence on the instruction associated with the operands) and to provide that selected
operand on the path 221. The multiplexer 215 is controlled using the routing signal T
and operates to select between the selected operand provided on the path 221 or an
inverted representation of the selected operand provided by the inversion logic 210 and
to provide that operand on the path 245 to the address generation logic 220.
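
Purely as an illustrative sketch, the routing of the offset operand through the
multiplexers 205 and 215 can be modelled in Python as follows. The function name, the
boolean encoding of the routing signals R and T, the `shift_fn` parameter standing in
for the shift logic 216, and the choice of two's complement negation for the inversion
logic 210 are all assumptions of this sketch rather than details of the embodiment.

```python
MASK32 = 0xFFFFFFFF  # operands are 32 bits wide in the described embodiment


def offset_path(offset, r, t, shift_fn=None):
    """Return the operand placed on path 245 for given routing signals R and T."""
    # Shift logic 216 (hypothetical callable); no shift when none is supplied.
    shifted = shift_fn(offset) & MASK32 if shift_fn else offset
    # Multiplexer 205: offset operand itself, or the shifted offset (path 221).
    selected = shifted if r else offset
    # Inversion logic 210: assumed here to be two's complement negation.
    inverted = (-selected) & MASK32
    # Multiplexer 215: selected operand, or its inverted representation.
    return inverted if t else selected
```

For example, `offset_path(5, True, True, lambda x: x << 3)` models an instruction that
needs both a shift by three bits and a negative operand, so both routing signals are
asserted.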

Hence, instructions which require the generation of an inverse operand and/or a
shift operation other than the shift operation which can be performed by the address
generation logic 220 cause the routing signals R and/or T to be generated which causes
the multiplexers 205 and 215 to select the appropriately manipulated operand.

Instructions which do not require the generation of a negative operand, or which
require a shift operation which can be performed by the address generation logic 220, or
which do not require any shift operation at all cause the routing signals R and T to be
generated which causes the multiplexers 205 and 215 to supply the offset operand
directly to address generation logic 220. The decode logic supplies the routing signals R
and T to the multiplexers 205 and 215 at the appropriate time, to coincide with the
instruction reaching the address generation stage 200.

As mentioned above, instructions which require the generation of an inverse
operand (also known as a subtrahend operand; it will be appreciated that in the
statement t - u = v, the terms t, u and v are referred to as the minuend, subtrahend and
difference operands respectively) cause routing signal T to be generated which causes
the multiplexer 215 to select the operand which has been routed through the inversion
logic 210.

Also, as mentioned above, instructions which require the generation of a shift
operation other than the shift operation which can be performed by the address
generation logic 220 cause the routing signal R to be generated which causes the
multiplexer 205 to select the operand which has been routed through the shift logic 216.
The shift logic 216 receives the operand and generates a shifted operand from the
received operand. The shifted operand may be the operand logically shifted left or right
by any number of bits. In this embodiment, each operand is 32 bits. Accordingly, the
shift logic 216 is operable to generate a shifted operand which is logically shifted
between 1 and 31 bits left or right. The decode logic supplies the routing signals R and
T to the multiplexers 205 and 215 respectively at the appropriate time, to coincide with
the instruction reaching the address generation stage 200.
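
As a minimal sketch only, the behaviour described for the shift logic 216 (a logical
shift of a 32-bit operand by between 1 and 31 bits, left or right) might be modelled in
Python as below. The function name and its parameters are hypothetical and chosen for
this illustration.

```python
MASK32 = 0xFFFFFFFF  # each operand is 32 bits in the described embodiment


def shift_logic_216(operand, amount, left=True):
    """Logically shift a 32-bit operand left or right by 1 to 31 bits."""
    if not 1 <= amount <= 31:
        raise ValueError("shift amount must be between 1 and 31 bits")
    operand &= MASK32
    if left:
        # Bits shifted beyond bit 31 are discarded; zeros enter at bit 0.
        return (operand << amount) & MASK32
    # Logical shift right: zeros enter at bit 31.
    return operand >> amount
```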

Instructions which require the generation of an inverse shifted operand cause
routing signals R and T to be generated which causes firstly a shifted operand to be
selected, as outlined above, and then the inverted representation of the shifted operand to
be selected. The decode logic supplies the routing signals R and T to the multiplexers
205 and 215 at the appropriate times, to coincide with the instruction reaching the
address generation stage 200.

Hence, the address generation logic 220 receives either the original operands
associated with the instruction or, where appropriate, operands which have been
manipulated by the shift logic 216 and/or the inversion logic 210. With reference to
Figure 6B, any operand which is to be the subject of a shift operation supported by the
address generation logic 220 is provided on the bus 245, the remaining operand is
provided on the bus 255.

The operand provided on the bus 245 is provided directly to one input of a two
input multiplexer 240. The operand provided on the bus 245 is also subject to a
predetermined logical shift left operation by interconnect logic 260. Whilst in this
embodiment the predetermined shift operation is a logical shift left by 2 bits operation
(this has been found to be the most frequently-occurring shift operation), it will be
appreciated that any particular shift operation could have been selected. The
interconnect logic 260 is arranged to reorder the bits provided on the bus 245 to effect
the logical shift operation and to provide these reordered bits to the second input of the
multiplexer 240.
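
By way of a hedged illustration, the reordering of bits performed by the interconnect
logic 260 can be pictured in Python as a fixed rewiring of a list of 32 bit values. The
helper names and the least-significant-bit-first list representation are assumptions of
this sketch.

```python
WIDTH = 32  # operand width in the described embodiment


def to_bits(value):
    """Return the 32 bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(WIDTH)]


def from_bits(bits):
    """Reassemble an integer from a least-significant-bit-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def interconnect_logic_260(bits):
    """Fixed logical shift left by 2: input bit i is wired to output bit i + 2."""
    if len(bits) != WIDTH:
        raise ValueError("expected a 32-bit operand")
    # Output bits 0 and 1 are tied to zero; input bits 30 and 31 are dropped.
    return [0, 0] + bits[:WIDTH - 2]
```

Because the shift amount is fixed, the sketch contains no arithmetic at all in
`interconnect_logic_260`, only a rearrangement of positions.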

Accordingly, it will be appreciated that where the operand provided on the bus
245 is, for example, Z, then Z and Z(LSL#2) are provided to the multiplexer 240.
Conversely, where the operand provided on the bus 245 is, for example, Z̄, then Z̄ and
Z̄(LSL#2) are provided to the multiplexer 240.

The multiplexer 240 receives a selection signal S from the decode logic, as
will be described in more detail below. The selection signal S is generated in
dependence on the instruction. Instructions which require the generation of the
particular shifted operand cause a selection signal S to be generated which causes the
multiplexer 240 to supply the shifted operand to an adder 250. Instructions which do
not require the generation of the shifted operand cause a selection signal S to be
generated which causes the multiplexer 240 to supply the received operand to the adder
250. The decode logic supplies the selection signal S to the multiplexer 240 at the
appropriate time, to coincide with the instruction reaching the address generation stage
200.

The adder 250 then combines the operands received over the buses 255 and 248
to generate an address which is provided on a bus 265 for subsequent use by the pipeline
stages.
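
The following Python sketch, offered only as an illustration, combines the fixed shift,
the multiplexer 240 and the adder 250 described above. The function name, the boolean
encoding of the selection signal S and the 32-bit wrap-around of the sum are assumptions
of this sketch.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit wrap-around of the generated address


def address_generation_logic_220(q, z, s):
    """Return the address on bus 265 for operand Q (bus 255) and Z (bus 245)."""
    # Interconnect logic 260: predetermined logical shift left by 2 bits.
    shifted = (z << 2) & MASK32
    # Multiplexer 240: selection signal S picks the shifted or unshifted operand.
    selected = shifted if s else z & MASK32
    # Adder 250: combine the base operand with the selected operand.
    return (q + selected) & MASK32
```

An operand that already arrives in negative form is simply added: with the two's
complement value `0xFFFFFFFC` on the bus 245, the sum is four below the base address.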

Figure 7 illustrates elements of a decode stage 70' which includes the decode
logic. The decode stage 70' comprises an immediate generator 270, an instruction
decoder 280 and a control signal generator 290.

The instruction decoder 280 is arranged to decode instructions and to provide
information and signals to enable that instruction to be processed by subsequent stages
in the pipeline.

On receipt of an instruction which requires the supply of an immediate or
constant, the instruction decoder 280 activates an immediate generator 270 which
produces the immediate in the required form in parallel with the operation of the
instruction decoder 280. It is possible to generate immediates in positive or negative
form in parallel with the instruction decoding without increasing the time taken by the
decode stage 70'. That immediate may then flow through the pipeline with the other
signals generated by the decode stage 70' or may be provided over a dedicated bus.

The instruction decoder 280 also when decoding an instruction activates a
control signal generator 290 which provides various control signals to subsequent stages of
the pipeline in dependence on that instruction. Three such control signals are the
selection signal S and the routing signals R and T. These signals may flow through
the pipeline with the other signals generated by the decode stage 70' or may be
provided over dedicated paths and are timed to coincide with the processing of this
instruction at particular stages in the pipeline.
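
One possible way of deriving the three control signals is sketched below in Python,
strictly as an illustration. The description does not specify how instructions are
encoded, so the parameters `shift_amount`, `shift_left` and `negate`, the `Controls`
type and the boolean signal encoding are all hypothetical.

```python
from typing import NamedTuple


class Controls(NamedTuple):
    s: bool  # selection signal S for the multiplexer 240
    r: bool  # routing signal R for the multiplexer 205
    t: bool  # routing signal T for the multiplexer 215


def control_signal_generator_290(shift_amount=0, shift_left=True, negate=False):
    """Derive S, R and T from hypothetical decoded instruction properties."""
    # The predetermined shift (logical shift left by 2) is handled by logic 220.
    fast_shift = shift_left and shift_amount == 2
    # Any other shift must be routed through the shift logic 216.
    other_shift = shift_amount != 0 and not fast_shift
    # A negative operand must be routed through the inversion logic 210.
    return Controls(s=fast_shift, r=other_shift, t=negate)
```

In this sketch an addition with no shift, or with the predetermined shift, leaves both
R and T deasserted, which corresponds to the quickest route through the stage.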

When the instruction to be processed involves subtracting a subtrahend operand
in the form of an immediate from a minuend operand, the immediate generator 270 may
be arranged to generate a negative or inverse immediate and provide this negative
immediate to the address generation stage 200. Because the immediate is already in
negative form, there is no need to invoke the inversion logic 210 and instead the
instruction can be dealt with as if it were an additive instruction. Accordingly, the
control signal generator 290 generates a selection signal S and routing signals R and T
to control the selection and routing of the operands as if they related to an additive
instruction. Hence, instructions utilising negative immediates can be treated as additive
instructions and the negative immediate can be routed directly to the address generation
logic 220. It will be appreciated that this further improves the performance of the
address generation stage 200.
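
As an illustrative sketch under stated assumptions, the treatment of a subtraction of
an immediate as an addition can be shown in Python. The function names and the use of a
32-bit two's complement representation for the negative immediate are assumptions of
this sketch.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit two's complement representation


def immediate_generator_270(value, negative=False):
    """Produce the immediate in positive form or, if requested, negative form."""
    return (-value) & MASK32 if negative else value & MASK32


def additive_address(q, operand):
    """Addition only: no inversion logic is involved on this route."""
    return (q + operand) & MASK32


# Subtracting 8 from the base address is handled as adding the negative immediate.
address = additive_address(0x2000, immediate_generator_270(8, negative=True))
```

Here `address` is eight below the base address even though only an addition was
performed.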

It will be appreciated that through the approach described above, the address
generation stage 200 is optimised to handle the most frequently encountered operations.
Accordingly, addition operations with or without a particular shift operation are
processed as quickly as possible, whilst some subtraction operations and/or other shift
operations require longer to process. Because the subtraction operations and the other
shift operations occur relatively less frequently than addition operations and the
particular shift operation, the overall performance of the address generation stage 200 is
significantly improved.

Although a particular embodiment has been described herein, it will be
apparent that the invention is not limited thereto, and that many modifications and
additions thereto may be made within the scope of the invention. For example,
various combinations of the features of the following dependent claims can be made
with the features of the independent claims without departing from the scope of the
present invention.